
When pg_rewind success, the database can't startup - Mailing list pgsql-bugs

From	hemin
Subject	When pg_rewind success, the database can't startup
Date	June 14, 2018 12:30:20
Msg-id	D73AFF2B-7325-4047-A325-F4B70828023B@ww-it.cn>+554133D5E253EE07
List	pgsql-bugs


Dear PGer:

I used pg_rewind to resolve a WAL divergence and it reported success, but the database cannot start and outputs the error “requested timeline 3 does not contain minimum recovery point 0/DB35BE80 on timeline 1”. The details follow.

Thanks!

Problem Description:

We have a primary/standby cluster with asynchronous replication. While a large amount of data was being inserted into the primary node, we stopped the database manually. We then promoted the standby node to be the new primary and inserted new data into it. Finally, we ran pg_rewind on the old primary to resolve the WAL divergence. It reported success, but the node cannot start and fails with the following error:

“2018-06-06 14:40:18.686 CST [2687] FATAL: requested timeline 3 does not contain minimum recovery point 0/DB35BE80 on timeline 1

2018-06-06 14:40:18.686 CST [2686] LOG: startup process (PID 2687) exited with exit code 1”

Environment: primary/standby cluster with asynchronous replication; the database version is PostgreSQL 10

Primary Node Info:

System: centos 6, IP:10.9.5.21, port 5410

Standby Node Info:

System: centos 6, IP:10.9.5.22, port: 5410

Steps to Reproduce:

(1) Init environment: create a primary/standby cluster with asynchronous replication, and add a pg_hba.conf entry for the role that pg_rewind will use;

(2) Primary Node: insert 1,500,000 rows into the database using pgbench:

pgbench -i -s 15 postgres

(3) Primary Node: once pgbench has finished inserting and has begun vacuuming the database, stop the database manually:

pg_ctl -D $PGDATA stop

(4) Standby Node: promote the standby node to be primary:

pg_ctl -D $PGDATA promote

(5) Standby Node: insert 3,000,000 rows into the database using pgbench:

pgbench -i -s 30 postgres

(6) Primary Node: use pg_rewind to resolve the WAL divergence:

pg_rewind --target-pgdata='/var/lib/pgsql/10/data' --source-server='host=10.9.5.22 port=5410 dbname=postgres user=postgres password=xxx'

servers diverged at WAL location 0/AEEE94D0 on timeline 1

rewinding from last common checkpoint at 0/AEEE9460 on timeline 1

Done!

(7) Primary Node: startup failed:

pg_ctl -D $PGDATA start

waiting for server to start....2018-06-06 14:40:18.194 CST [2686] LOG: listening on IPv4 address "0.0.0.0", port 5410

2018-06-06 14:40:18.194 CST [2686] LOG: listening on IPv6 address "::", port 5410

2018-06-06 14:40:18.256 CST [2686] LOG: listening on Unix socket "/tmp/.s.PGSQL.5410"

2018-06-06 14:40:18.372 CST [2687] LOG: database system was interrupted while in recovery at log time 2018-06-06 14:12:45 CST

2018-06-06 14:40:18.372 CST [2687] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

2018-06-06 14:40:18.686 CST [2687] LOG: entering standby mode

2018-06-06 14:40:18.686 CST [2687] FATAL: requested timeline 3 does not contain minimum recovery point 0/DB35BE80 on timeline 1

2018-06-06 14:40:18.686 CST [2686] LOG: startup process (PID 2687) exited with exit code 1

2018-06-06 14:40:18.686 CST [2686] LOG: aborting startup due to startup process failure

2018-06-06 14:40:18.690 CST [2686] LOG: database system is shut down

stopped waiting

pg_ctl: could not start server

Examine the log output.
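
To compare the positions reported above, the LSNs can be converted to byte offsets. The following is a minimal sketch and not part of the original report: `lsn_to_int` and `min_recovery_point` are hypothetical helper names, and the sketch assumes a POSIX shell with 64-bit arithmetic and that pg_controldata prints a "Minimum recovery ending location" line. It only shows that the minimum recovery point in the FATAL message lies beyond the divergence point reported by pg_rewind; it does not establish the cause of the failure.

```shell
# Hypothetical helper: convert an LSN "HI/LO" (two hex numbers) to a byte position.
lsn_to_int() (
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
)

# Hypothetical helper: extract the minimum recovery point from pg_controldata output on stdin.
# Usage on the rewound node: pg_controldata "$PGDATA" | min_recovery_point
min_recovery_point() {
    sed -n 's/^Minimum recovery ending location:[[:space:]]*//p'
}

min_recovery=$(lsn_to_int 0/DB35BE80)   # from the FATAL message
diverged=$(lsn_to_int 0/AEEE94D0)       # from the pg_rewind output

if [ "$min_recovery" -gt "$diverged" ]; then
    echo "minimum recovery point is $(( min_recovery - diverged )) bytes past the divergence point"
fi
```

Running pg_controldata on the rewound data directory before starting the server would show whether the stored minimum recovery point matches the value in the error message.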

何敏

Call: 185.0821.2027 | Fax: 028.6143.1877 | Web: w3.ww-it.cn

成都文武信息技术有限公司|ChengDu WenWu Information Technology Inc.|WwIT

Address: Room 611, Building 7, Zone B, Tianfu Software Park, Chengdu High-tech Zone | Postal code: 610041
